feat: Add video frame format render option #1142
jrusso1020
left a comment
Approve ✅
Solid PR with a real root cause and good evidence. I reproduced the pipeline locally to sanity-check both the diagnosis and the chosen fix, and it holds up.
Verified the root cause
Synthesized a saturated-red UI element (RGB(221,56,46)) encoded as a typical h264/yuv420p screen recording, then extracted a frame the way the engine does (-vf fps -q:v 2):
| Extraction | Sampled pixel | maxΔ from source |
|---|---|---|
| JPEG-420 (current) | (207,40,48) | 16 |
| JPEG-444 (no chroma subsampling) | (207,40,47) | 16 |
| PNG (this PR) | (220,55,45) | 1 |
The shift is not chroma subsampling — JPEG-444 shifts the saturated red just as much as 420. It's intrinsic to JPEG's RGB→YUV→RGB roundtrip on saturated colors. PNG stays in RGB end-to-end, so it's the only thing that actually preserves the pixel. There is no cheaper JPEG-side workaround, which makes PNG the correct fix here.
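For reference, maxΔ in the table is the largest absolute per-channel difference between the sampled pixel and the source pixel. A minimal sketch of that metric applied to the values above; `RGB` and `maxChannelDelta` are illustrative names, not code from this PR:

```typescript
type RGB = readonly [number, number, number];

// Largest absolute per-channel difference between two 8-bit RGB pixels.
function maxChannelDelta(a: RGB, b: RGB): number {
  return Math.max(
    Math.abs(a[0] - b[0]),
    Math.abs(a[1] - b[1]),
    Math.abs(a[2] - b[2]),
  );
}

// Source pixel and extracted samples, copied from the table above.
const source: RGB = [221, 56, 46];
const samples: Record<string, RGB> = {
  "JPEG-420": [207, 40, 48],
  "JPEG-444": [207, 40, 47],
  PNG: [220, 55, 45],
};

for (const [name, pixel] of Object.entries(samples)) {
  console.log(name, maxChannelDelta(source, pixel)); // 16, 16, 1
}
```

In both JPEG rows the green channel drives the delta (56 to 40), which is why dropping chroma subsampling alone does not move the number.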
Why this is the right shape
This is fundamentally a preview/render fidelity gap: Studio preview plays the source <video> directly in Chrome (RGB, no extraction), while render replaces it with pre-extracted frames before capture — so the JPEG shift is baked in before the encoder runs. That's exactly why touching final H.264 color tags can't fix it, as the description notes.
Keeping the default at auto is the right call. I benchmarked PNG-for-everything on a photographic 1080p frame: ~8x slower extraction and ~5x larger (a 60s 1080p30 render goes from ~1.4 GiB to ~6.8 GiB of intermediate frames), and on photographic content the JPEG shift is perceptually invisible. So a blanket PNG default would tax the common case heavily — and on Lambda that frame bloat risks blowing /tmp ephemeral storage outright, not just slowing things down. Opt-in is correct.
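As a back-of-the-envelope check on those storage figures, the per-frame sizes implied by the totals can be derived like this. This is only arithmetic on the numbers quoted above, and the helper names are illustrative:

```typescript
const GIB = 1024 * 1024 * 1024;
const MIB = 1024 * 1024;

// Number of extracted frames for a clip of the given length and frame rate.
function frameCount(durationSec: number, fps: number): number {
  return Math.round(durationSec * fps);
}

// Average bytes per frame implied by a total intermediate size in GiB.
function bytesPerFrame(totalGiB: number, frames: number): number {
  return (totalGiB * GIB) / frames;
}

const frames = frameCount(60, 30); // 60 s at 30 fps = 1800 frames
const jpgPerFrame = bytesPerFrame(1.4, frames);
const pngPerFrame = bytesPerFrame(6.8, frames);

console.log(frames, "frames");
console.log((jpgPerFrame / MIB).toFixed(2), "MiB per JPEG frame");
console.log((pngPerFrame / MIB).toFixed(2), "MiB per PNG frame");
console.log((pngPerFrame / jpgPerFrame).toFixed(1), "x larger");
```

The same two helpers can be pointed at other durations or frame rates to estimate whether a given render would fit in the available scratch space.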
Suggested follow-up (not blocking)
The --video-frame-format flag is render-global, but the need is per-source: a composition mixing a UI screen-recording with photographic b-roll has to choose all-or-nothing. A per-<video> hint (e.g. data-frame-format="png", consistent with data-has-audio/data-start) would let authors opt the screen-recording clip into PNG while b-roll stays JPEG — closing the preview/render gap where it matters without the global cost. The flag is a good blunt override to keep alongside it.
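A rough sketch of how such a per-source hint could layer on top of the global flag. Everything here is hypothetical: `data-frame-format` is only the attribute proposed above, `requestedFormatFor` is an illustrative helper rather than existing HyperFrames code, and letting the per-element hint win over the global flag is one possible precedence, not a settled design:

```typescript
type VideoFrameFormat = "auto" | "jpg" | "png";

// Hypothetical: choose the requested format for a single <video> source.
// A recognized per-element hint wins; anything else falls back to the
// render-global --video-frame-format value.
function requestedFormatFor(
  perVideoHint: string | null | undefined,
  globalFormat: VideoFrameFormat = "auto",
): VideoFrameFormat {
  if (perVideoHint === "png" || perVideoHint === "jpg" || perVideoHint === "auto") {
    return perVideoHint;
  }
  return globalFormat;
}

// Screen recording opts into PNG; b-roll without a hint keeps the global value.
console.log(requestedFormatFor("png", "auto"));
console.log(requestedFormatFor(null, "auto"));
```

The result would still need to pass through the existing format resolution, so alpha sources keep extracting as PNG regardless of the hint.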
Code is clean, threaded consistently through local/Docker/producer/distributed, cache key correctly incorporates the effective format so JPG/PNG caches can't collide, and the regression test is well-targeted. 👍
      requested?: VideoFrameFormat,
    ): CacheFrameFormat {
      if (metadata.hasAlpha || codecMayHaveAlpha(metadata.videoCodec)) return "png";
      if (requested === "png" || requested === "jpg") return requested;
Nice — moving the alpha/alpha-capable check ahead of requested isn't just cosmetic, it's a latent bugfix. Under the old order (if (requested) return requested), an explicit jpg would have stripped alpha from a transparent source. Now alpha always wins and jpg/png only override the opaque path, which is the correct precedence.
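To make the precedence change concrete, here is a self-contained sketch of the two orderings side by side. The types, the `codecMayHaveAlpha` stub and its codec list, and the `resolveOld`/`resolveNew` names are assumptions for illustration. Only the two `if` lines in `resolveNew` mirror the diff; `resolveOld` is reconstructed from the description in this comment, and the final `jpg` fallback follows the documented `auto` behavior for opaque sources.

```typescript
type VideoFrameFormat = "auto" | "jpg" | "png";
type CacheFrameFormat = "jpg" | "png";

interface VideoMetadata {
  hasAlpha: boolean;
  videoCodec: string;
}

// Stub only: the real helper's codec list is not shown in this excerpt.
function codecMayHaveAlpha(codec: string): boolean {
  return codec === "vp9" || codec === "prores";
}

// Old order (reconstructed): an explicit request short-circuits the alpha check.
function resolveOld(metadata: VideoMetadata, requested?: VideoFrameFormat): CacheFrameFormat {
  if (requested === "png" || requested === "jpg") return requested;
  if (metadata.hasAlpha || codecMayHaveAlpha(metadata.videoCodec)) return "png";
  return "jpg";
}

// New order: alpha always wins; jpg/png only override the opaque path.
function resolveNew(metadata: VideoMetadata, requested?: VideoFrameFormat): CacheFrameFormat {
  if (metadata.hasAlpha || codecMayHaveAlpha(metadata.videoCodec)) return "png";
  if (requested === "png" || requested === "jpg") return requested;
  return "jpg";
}

const transparent: VideoMetadata = { hasAlpha: true, videoCodec: "vp9" };
const opaque: VideoMetadata = { hasAlpha: false, videoCodec: "h264" };

console.log(resolveOld(transparent, "jpg")); // "jpg": alpha would be stripped
console.log(resolveNew(transparent, "jpg")); // "png": alpha preserved
console.log(resolveNew(opaque, "png")); // "png": explicit opt-in honored
console.log(resolveNew(opaque)); // "jpg": auto/undefined keeps existing behavior
```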
@xuelongmu can you rebase onto main and resolve the merge conflicts? After that, feel free to merge it!
Summary
Adds a first-class render option for source-video frame extraction, `--video-frame-format auto|jpg|png` on `hyperframes render`, and the matching programmatic config field, `RenderConfig.videoFrameFormat`.
The default remains `auto`, preserving the existing behavior: alpha or alpha-capable video sources extract as PNG, and opaque sources extract as JPG.
Rationale
When compositing screen recordings, for example an iPhone UI recording, the timeline/browser preview can look close to the source while the rendered MP4 shifts saturated UI colors. Changing final H.264 color tags or range does not address this when the color shift has already happened earlier in the pipeline.
Pixel checks showed the shifted color was already present in HyperFrames' captured browser PNG frames. The root cause is that non-alpha source videos were extracted as JPEG frames before browser capture, which can visibly change saturated UI reds before final encoding ever runs. In one representative sample, the original phone recording frame sampled `RGB(221,56,46)` on a saturated red UI indicator, while HyperFrames' browser PNG frame sampled `RGB(186,51,58)`. A direct ffmpeg source-video composite preserved the same pixel closely at `RGB(220,56,50)`, showing the final H.264 encoder can preserve the color when the source video does not pass through JPEG extraction.
This PR adds an explicit, opt-in PNG extraction path for UI recordings, screen captures, and other color-sensitive source videos. It avoids an RGB lift or color-correction workaround and leaves final encoder defaults unchanged.
What Changed
- Added `--video-frame-format auto|jpg|png` to `hyperframes render`.
- Added `RenderConfig.videoFrameFormat`.
- Updated `resolveFrameFormat` so:
  - `png` extracts opaque videos as PNG
  - `jpg` extracts opaque videos as JPG
  - `auto`/undefined preserves existing behavior
Validation
- `bun run --filter @hyperframes/engine test -- src/services/videoFrameExtractor.test.ts`
- `bun run --filter @hyperframes/cli test -- src/commands/render.test.ts src/utils/dockerRunArgs.test.ts`
- `bun run --filter @hyperframes/aws-lambda test -- src/sdk/validateConfig.test.ts`
- `bun run --filter @hyperframes/engine typecheck`
- `bun run --filter @hyperframes/producer typecheck`
- `bun run --filter @hyperframes/cli typecheck`
- `bun run --filter @hyperframes/aws-lambda typecheck`
- `bunx oxfmt --check`
- `bunx oxlint`
The new regression test synthesizes a tiny saturated-red UI fixture on a pale pink background and verifies PNG extraction keeps sampled red pixels within a max channel delta of 5 from the decoded source, while also proving JPG and PNG extraction caches remain separate.